Some aspects of the predictability problem in dynamical systems are
reviewed. The deep relation among Lyapunov exponents, Kolmogorov-Sinai
entropy, Shannon entropy and algorithmic complexity is discussed.
In particular, we emphasize how a characterization of the unpredictability
of a system gives a measure of its complexity. Special attention is devoted
to finite-resolution effects on predictability, which can be accounted for
with suitable generalizations of the standard indicators. The problems
involved in systems with intrinsic randomness are discussed, with emphasis
on the important problems of distinguishing chaos from noise and of
modeling the system.
All the simple systems are simple in the same way, each complex system has
its own complexity (freely inspired by Anna Karenina by Lev N. Tolstoy)



1. Introduction 

The possibility of predicting future states of a system stands at the
foundations of scientific knowledge, with an obvious relevance from both a
conceptual and an applied point of view. Perfect knowledge of the evolution
law of a system may suggest that this aim can be attained. This classical
deterministic point of view was stated by Laplace: once the evolution laws
of the system are known, the state at a certain time $t_0$ completely
determines the subsequent states for every time $t > t_0$. However, it is
now well established that in some systems full predictability cannot be
accomplished in practice because of the unavoidable uncertainty in the
initial conditions. Indeed, as already stated by Poincaré, long-time
predictions are reliable only when the evolution law does not amplify the
initial uncertainty too rapidly. Therefore, from the point of view of
predictability, we need to know how an error on the initial state of the
system grows in time. In systems with sensitive dependence on initial
conditions (deterministic chaotic systems) errors grow exponentially fast in
time, limiting the ability to predict the future states.

A branch of the theory of dynamical systems has been developed with 
the aim of formalizing and characterizing the sensitivity to initial condi- 
tions. The Lyapunov exponent and the Kolmogorov-Sinai entropy are the 
two main indicators for measuring the rate of error growth and informa- 
tion production during a deterministic system evolution. A complementary 
approach has been developed in the context of information theory, data 
compression and algorithmic complexity theory and it is rather clear that 
the latter point of view is closely related to the dynamical systems one. If 
a system is chaotic, then its predictability is limited up to a time which is 
related to the first Lyapunov exponent, and the time sequence by which we 
encode one of its chaotic trajectories cannot be compressed by an arbitrary 
factor, i.e. it is algorithmically complex. On the contrary, the coding of a
regular trajectory can be easily compressed (e.g., for a periodic trajectory
it is sufficient to have the sequence for a period), so it is "simple".
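The contrast between compressible and incompressible codings can be made concrete with a small numerical experiment. The following Python sketch is an illustration added here, not part of the original argument: it assumes the logistic map $x \to 4x(1-x)$ as the chaotic system, a binary coding with threshold 1/2, and the general-purpose compressor `zlib` as a stand-in for an ideal coder. A real compressor only gives an upper bound on the length of the shortest description.

```python
import zlib

def logistic_symbols(n, x0=0.3):
    """Code n iterates of x -> 4x(1-x) as bits: 0 if x < 1/2, else 1."""
    x, bits = x0, []
    for _ in range(n):
        bits.append(1 if x >= 0.5 else 0)
        x = 4.0 * x * (1.0 - x)
    return bits

def pack_bits(bits):
    """Pack 0/1 symbols into bytes, 8 symbols per byte (leftover bits dropped)."""
    packed = bytearray()
    for i in range(0, len(bits) - len(bits) % 8, 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        packed.append(byte)
    return bytes(packed)

def compressed_fraction(bits):
    """Compressed size divided by the raw (bit-packed) size."""
    raw = pack_bits(bits)
    return len(zlib.compress(raw, 9)) / len(raw)

n_symbols = 80_000
periodic = [1, 0] * (n_symbols // 2)     # regular (period-2) coding
chaotic = logistic_symbols(n_symbols)    # coding of a chaotic trajectory

frac_periodic = compressed_fraction(periodic)
frac_chaotic = compressed_fraction(chaotic)
print(f"periodic: compressed to {frac_periodic:.4f} of its size")
print(f"chaotic:  compressed to {frac_chaotic:.4f} of its size")
```

One expects the periodic coding to shrink to a tiny fraction of its length, while the chaotic coding is left essentially unchanged by the compressor.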

In this paper we will discuss how unpredictability and algorithmic
complexity are closely related and how information and chaos theory complete
each other in giving a general understanding of complexity in dynamical
processes. In particular, we shall consider the extension of this approach,
nowadays well established in the context of low dimensional systems and
for asymptotic regimes, to high dimensional systems, with attention to
situations far from asymptotic (i.e. finite time and finite observational
resolution) [?].

2. Two points of view 

2.1. Dynamical systems approach: Characteristic Lyapunov exponents 

The characteristic Lyapunov exponents are, in a sense, an extension of
linear stability analysis to the case of aperiodic motions. Roughly speaking,
they measure the typical rate of exponential divergence of nearby
trajectories and thus contain information on the growth rate of a very small
error on the initial state of a system.

Consider a dynamical system with an evolution law given, e.g., by the
differential equation

$$ \frac{d\mathbf{x}}{dt} = \mathbf{F}(\mathbf{x}) ; \tag{2.1} $$

we assume that $\mathbf{F}$ is smooth enough that the evolution is
well-defined for time intervals of arbitrary extension, and that the motion
occurs in a bounded region of the phase space. We intend to study the
separation between two trajectories, $\mathbf{x}(t)$ and $\mathbf{x}'(t)$,
starting from two close initial conditions, $\mathbf{x}(0)$ and
$\mathbf{x}'(0) = \mathbf{x}(0) + \delta\mathbf{x}(0)$, respectively.

As long as the difference between the trajectories,
$\delta\mathbf{x}(t) = \mathbf{x}'(t) - \mathbf{x}(t)$, remains small
(infinitesimal, strictly speaking), it can be regarded as a vector,
$\mathbf{z}(t)$, in the tangent space. The time evolution of $\mathbf{z}(t)$
is given by the linearized differential equations:

$$ \frac{dz_i(t)}{dt} = \sum_{j=1}^{d} \left. \frac{\partial F_i}{\partial x_j} \right|_{\mathbf{x}(t)} z_j(t) . \tag{2.2} $$



Under rather general hypotheses, Oseledec [3] proved that for almost all
initial conditions $\mathbf{x}(0)$, there exists an orthonormal basis
$\{\mathbf{e}_i\}$ in the tangent space such that, for large times,

$$ \mathbf{z}(t) = \sum_{i=1}^{d} c_i \mathbf{e}_i e^{\lambda_i t} , \tag{2.3} $$

where the coefficients $\{c_i\}$ depend on $\mathbf{z}(0)$. The exponents
$\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d$ are called
characteristic Lyapunov exponents (LEs). If the dynamical system has an
ergodic invariant measure, the spectrum of LEs $\{\lambda_i\}$ does not
depend on the initial condition, except for a set of measure zero with
respect to the natural invariant measure.

Equation (2.3) describes how a $d$-dimensional spherical region of the
phase space, with radius $\epsilon$ centered in $\mathbf{x}(0)$, deforms,
with time, into an ellipsoid of semi-axes
$\epsilon_i(t) = \epsilon \exp(\lambda_i t)$, directed along the
$\mathbf{e}_i$ vectors. Furthermore, for a generic small perturbation
$\delta\mathbf{x}(0)$, the distance between the reference and the perturbed
trajectory behaves as

$$ |\delta\mathbf{x}(t)| \sim |\delta\mathbf{x}(0)| \, e^{\lambda_1 t} \left[ 1 + O\left( e^{-(\lambda_1 - \lambda_2) t} \right) \right] . $$

If $\lambda_1 > 0$ we have a rapid (exponential) amplification of an error
on the initial condition. In such a case the system is chaotic and, de facto,
unpredictable on long times. Indeed, if the initial error amounts to
$\delta_0 = |\delta\mathbf{x}(0)|$, and we aim to predict the states of the
system with a certain tolerance $\Delta$ (not too large), then the
prediction is reliable just up to a predictability time given by

$$ T_p \sim \frac{1}{\lambda_1} \ln\left( \frac{\Delta}{\delta_0} \right) . \tag{2.4} $$

This equation shows that $T_p$ is basically determined by the largest
Lyapunov exponent, since its dependence on $\delta_0$ and $\Delta$ is
logarithmically weak. Because of its preeminent role, $\lambda_1$ is often
referred to as "the Lyapunov exponent", and denoted by $\lambda$.
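The weak, logarithmic dependence of the predictability time on the initial error can be checked numerically. The sketch below is an added illustration under stated assumptions: it uses the one-dimensional logistic map $x \to 4x(1-x)$ (a discrete-time example chosen here, whose Lyapunov exponent is known to be $\ln 2$) rather than a differential equation, estimates the exponent as the time average of $\ln|f'(x)|$, and then evaluates $T_p$ for two values of the initial error. The function names are hypothetical.

```python
import math

def logistic_lyapunov(n_steps=100_000, x0=0.3, n_transient=1_000):
    """Estimate lambda for x -> 4x(1-x) as the time average of ln|f'(x)|."""
    x = x0
    for _ in range(n_transient):
        x = 4.0 * x * (1.0 - x)
    total = 0.0
    for _ in range(n_steps):
        # f'(x) = 4 - 8x; the floor guards against log(0) at x = 1/2
        total += math.log(max(abs(4.0 - 8.0 * x), 1e-300))
        x = 4.0 * x * (1.0 - x)
    return total / n_steps

def predictability_time(lam, delta0, tolerance):
    """T_p ~ (1/lambda) ln(tolerance / delta0)."""
    return math.log(tolerance / delta0) / lam

lam = logistic_lyapunov()
t_short = predictability_time(lam, delta0=1e-8, tolerance=1e-2)
t_long = predictability_time(lam, delta0=1e-14, tolerance=1e-2)
print(f"lambda estimate: {lam:.4f} (ln 2 = {math.log(2):.4f})")
print(f"T_p for delta0 = 1e-8:  {t_short:.1f} iterations")
print(f"T_p for delta0 = 1e-14: {t_long:.1f} iterations")
```

Improving the initial accuracy by six orders of magnitude only doubles the predictability time in this example, which is the practical content of the logarithmic dependence.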



2.2. Information based approach

In experimental investigations of physical processes, the access to a
system occurs only through a measuring device which produces a time record
of a certain observable, i.e. a sequence of data. In this regard a system,
whether or not chaotic, generates messages and may be regarded as a source
of information whose properties can be analysed through the tools of
information theory.

The characterization of the information contained in a sequence can be
approached in two very different frameworks. The first considers a specific
message (sequence) as belonging to the ensemble of all the messages that
can be emitted by a source, and defines an average information content by
means of the average compressibility properties of the ensemble. The
second considers the problem of characterizing the universal compressibility
(i.e. ensemble independent) of a specific sequence and concerns the theory
of algorithmic complexity and algorithmic information theory [?]. For the
sake of self-consistency we briefly recall the concepts and ideas about the
Shannon entropy [4], which is the basis of the whole of information theory.

2.2.1. Shannon entropy

Consider a source that can output $m$ different symbols; denote with $s_t$
the symbol emitted by the source at time $t$ and with $P(C_N)$ the
probability that a given word $C_N = (s_1, s_2, \ldots, s_N)$, of length $N$,
is emitted: $P(C_N) = P(s_1, s_2, \ldots, s_N)$. We assume that the source is
stationary, so that, for the sequences $\{s_t\}$, the time translation
invariance holds: $P(s_1, \ldots, s_N) = P(s_{t+1}, \ldots, s_{t+N})$. We
introduce the $N$-block entropies

$$ H_N = - \sum_{\{C_N\}} P(C_N) \ln P(C_N) ; \tag{2.5} $$

for stationary sources the limit

$$ \lim_{N \to \infty} \frac{H_N}{N} = h_{Sh} \tag{2.6} $$
exists and defines the Shannon entropy $h_{Sh}$, which quantifies the
richness (or "complexity") of the source emitting the sequence. This can be
precisely expressed by the first theorem of Shannon-McMillan [7], which
applies to stationary ergodic sources: the ensemble of $N$-long
subsequences, when $N$ is large enough, can be partitioned in two classes,
$\Omega_1(N)$ and $\Omega_0(N)$, such that all the words
$C_N \in \Omega_1(N)$ have the same probability
$P(C_N) \sim \exp(-N h_{Sh})$ and

$$ \sum_{C_N \in \Omega_1(N)} P(C_N) \to 1 \quad \text{while} \quad \sum_{C_N \in \Omega_0(N)} P(C_N) \to 0 \quad \text{for } N \to \infty . \tag{2.7} $$

The meaning of this theorem is the following. An $m$-states process admits,
in principle, $m^N$ possible sequences of length $N$. However the number of
typical sequences, $N_{eff}(N)$, effectively observable (i.e. those
belonging to $\Omega_1(N)$) is

$$ N_{eff}(N) \sim \exp(N h_{Sh}) . \tag{2.8} $$

Note that $N_{eff} \ll m^N$ if $h_{Sh} < \ln m$. The entropy per symbol,
$h_{Sh}$, is a property of the source. However, because of the ergodicity,
$h_{Sh}$ can be obtained by analyzing just one single sequence in the
ensemble of the typical ones, and it can also be viewed as a property of
each typical sequence.
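The block entropies and the counting of typical sequences can be illustrated on a source simple enough to be treated exactly. The sketch below is an added example, not taken from the text: it assumes a stationary two-state Markov chain with a hypothetical transition matrix, enumerates all words of length $N$, and compares the increments of the block entropy with the known entropy rate of a Markov chain, $-\sum_a \pi_a \sum_b T_{ab} \ln T_{ab}$.

```python
import math
from itertools import product

def block_entropy(n, p_init, trans):
    """H_N = -sum P(C_N) ln P(C_N) over all words of length n of a Markov source."""
    m = len(p_init)
    h = 0.0
    for word in product(range(m), repeat=n):
        p = p_init[word[0]]
        for a, b in zip(word, word[1:]):
            p *= trans[a][b]
        if p > 0.0:
            h -= p * math.log(p)
    return h

# Hypothetical source: trans[a][b] = P(next symbol = b | current symbol = a)
trans = [[0.9, 0.1], [0.4, 0.6]]
p_stat = [0.8, 0.2]   # stationary distribution of this chain

# Entropy rate of a stationary Markov chain (natural logarithm)
entropy_rate = -sum(
    p_stat[a] * trans[a][b] * math.log(trans[a][b])
    for a in range(2) for b in range(2)
)

for n in (2, 4, 8, 12):
    h_n = block_entropy(n, p_stat, trans)
    print(f"N={n:2d}  H_N/N={h_n / n:.4f}  "
          f"exp(N h)={math.exp(n * entropy_rate):8.1f}  m^N={2 ** n}")
print(f"entropy rate h = {entropy_rate:.4f}, ln m = {math.log(2):.4f}")
```

For this source $H_N/N$ approaches the entropy rate only slowly, while the increments $H_{N+1} - H_N$ already equal it for every $N \geq 1$; the number of typical words $\exp(N h)$ grows much more slowly than the total number $m^N$.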

In information theory, expression (2.8) is somehow the equivalent of the
Boltzmann equation in statistical thermodynamics: $S \propto \ln W$, $W$
being the number of possible microscopic configurations and $S$ the
thermodynamic entropy; this justifies the name "entropy" for $h_{Sh}$.

The relevance of the Shannon entropy in information theory is given
by the fact that $h_{Sh}$ sets the maximum compression rate of a sequence
$\{s_1, s_2, s_3, \ldots\}$. Indeed a theorem of Shannon states that, if the
length $T$ of a sequence is large enough, there exists no other sequence
(always using $m$ symbols), from which it is possible to reconstruct the
original one, whose length is smaller than $(h_{Sh}/\ln m)\,T$ [4]. In other
words, $h_{Sh}/\ln m$ represents the maximum allowed compression rate. The
relation between Shannon entropy and data compression problems is well
illustrated by considering the optimal coding (Shannon-Fano) to map $M$
objects (e.g. the $N$-words $C_N$) into sequences of binary digits (0, 1)
[8]. Denoting with $\ell_N$ the binary length of the sequence specifying
$C_N$, we have

$$ \lim_{N \to \infty} \frac{\langle \ell_N \rangle}{N} = \frac{h_{Sh}}{\ln 2} , \tag{2.9} $$

i.e., in a good coding, the mean length of an $N$-word is equal to $N$ times
the Shannon entropy, apart from a multiplicative factor, since in the
definition (2.6) of $h_{Sh}$ we used the natural logarithm and here we want
to work with a two symbol code.
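The convergence of the mean code length per symbol to $h_{Sh}/\ln 2$ can be seen in a small worked example. The sketch below is an added illustration with assumptions of its own: the source is taken to be memoryless with a hypothetical four-symbol distribution, and each $N$-word is assigned the codeword length $\lceil -\log_2 P(C_N) \rceil$ that appears in Shannon's coding argument. Only the lengths are computed; the codewords themselves are not constructed.

```python
import math
from itertools import product

def shannon_code_lengths(probs):
    """Binary codeword lengths l = ceil(-log2 p) for each probability."""
    return [math.ceil(-math.log2(p)) for p in probs]

def mean_length_per_symbol(symbol_probs, n):
    """Mean code length per symbol when coding n-words of an i.i.d. source."""
    word_probs = [math.prod(w) for w in product(symbol_probs, repeat=n)]
    lengths = shannon_code_lengths(word_probs)
    return sum(p * l for p, l in zip(word_probs, lengths)) / n

symbol_probs = [0.4, 0.3, 0.2, 0.1]                    # hypothetical source
h_sh = -sum(p * math.log(p) for p in symbol_probs)     # entropy, natural log
h_bits = h_sh / math.log(2)                            # h_Sh / ln 2

# Kraft sum <= 1 means a prefix code with these lengths exists
kraft_sum = sum(2.0 ** -l for l in shannon_code_lengths(symbol_probs))

for n in range(1, 7):
    print(f"N={n}  mean bits per symbol = {mean_length_per_symbol(symbol_probs, n):.4f}")
print(f"h_Sh / ln 2 = {h_bits:.4f}, Kraft sum for N=1: {kraft_sum:.4f}")
```

With these lengths the mean number of bits per symbol lies between $h_{Sh}/\ln 2$ and $h_{Sh}/\ln 2 + 1/N$, so it approaches the entropy bound as the word length grows.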
2.2.2. The Kolmogorov-Sinai entropy

After the introduction of the Shannon entropy we can easily define the
Kolmogorov-Sinai entropy, which is the analogous measure of complexity
applied to dynamical systems. Consider a trajectory, $\mathbf{x}(t)$,
generated by a deterministic system, sampled at the times $t_j = j\tau$,
with $j = 1, 2, 3, \ldots$. Perform a finite partition $\mathcal{A}$ of the
phase space, with the finite number of symbols $\{s\}_{\mathcal{A}}$
enumerating the cells of the partition. The time-discretized trajectory
$\mathbf{x}(t_j)$ determines a sequence $\{s(1), s(2), s(3), \ldots\}$, whose
meaning is clear: at the time $t_j$ the trajectory is in the cell labeled by
$s(j)$. To each subsequence of length $N \cdot \tau$ one can associate a
word of length $N$:
$W_j^N(\mathcal{A}) = (s(j), s(j+1), \ldots, s(j+(N-1)))$. If the system is
ergodic, as we suppose, from the frequencies of the words one obtains the
probabilities by which the block entropies $H_N(\mathcal{A})$ are
calculated:

$$ H_N(\mathcal{A}) = - \sum_{\{W^N(\mathcal{A})\}} P(W^N(\mathcal{A})) \ln P(W^N(\mathcal{A})) . \tag{2.10} $$

The probabilities $P(W^N(\mathcal{A}))$, computed from the frequencies of
$W^N(\mathcal{A})$ along a trajectory, are essentially dependent on the
stationary measure selected by the trajectory. The entropy per unit time of
the trajectory with respect to the partition $\mathcal{A}$,
$h(\mathcal{A})$, is defined as follows:

$$ h(\mathcal{A}) = \frac{1}{\tau} \lim_{N \to \infty} \frac{1}{N} H_N(\mathcal{A}) . \tag{2.11} $$

Notice that, for the deterministic systems we are considering, the entropy
per unit time does not depend on the sampling time $\tau$ [?]. The
KS-entropy ($h_{KS}$), by definition, is the supremum of $h(\mathcal{A})$
over all possible finite partitions [?]:

$$ h_{KS} = \sup_{\mathcal{A}} h(\mathcal{A}) . \tag{2.12} $$

The extremal character of $h_{KS}$ makes every computation based on the
definition (2.12) impossible in the majority of practical cases. In this
respect, a useful tool would be the Kolmogorov-Sinai theorem, which
guarantees that $h_{KS} = h(\mathcal{G})$ if $\mathcal{G}$ is a generating
partition. A partition is said to be generating if every infinite sequence
$\{s_n\}_{n=1,\ldots,\infty}$ corresponds to a single initial point. However
the difficulty now is that, with the exception of very simple cases, we do
not know how to construct a generating partition. We only know that,
according to the Krieger theorem [11], there exists a generating partition
with $k$ elements such that $e^{h_{KS}} < k \leq e^{h_{KS}} + 1$. Then, a
more tractable way to define $h_{KS}$ is based upon considering the
partition $\mathcal{A}_\epsilon$ made up by a grid of cubic cells of edge
$\epsilon$, from which one has

$$ h_{KS} = \lim_{\epsilon \to 0} h(\mathcal{A}_\epsilon) . \tag{2.13} $$
We expect that $h(\mathcal{A}_\epsilon)$ becomes independent of $\epsilon$
when $\mathcal{A}_\epsilon$ is so fine as to be "contained" in a generating
partition.

For discrete time maps what has been presented above is still valid, with
$\tau = 1$ (however, Krieger's theorem only applies to invertible maps).

The important point to note is that, for a truly stochastic (i.e.
non-deterministic) system, with continuous states, $h(\mathcal{A}_\epsilon)$
is not bounded and $h_{KS} = \infty$.
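The procedure just described (partition, symbolic sequence, word frequencies, block entropies) can be sketched numerically for one of the simple cases in which a generating partition is known. The example below is an added illustration: it assumes the logistic map $x \to 4x(1-x)$ with $\tau = 1$ and the two-cell partition with boundary at $x = 1/2$, for which the entropy per unit time is $\ln 2$. As a practical shortcut the limit of $H_N/N$ is replaced by the increment $H_{N+1} - H_N$, which has the same limit and usually converges faster.

```python
import math
from collections import Counter

def logistic_symbols(n, x0=0.3, n_transient=1_000):
    """Symbolic sequence of x -> 4x(1-x) for the partition [0,1/2), [1/2,1]."""
    x = x0
    for _ in range(n_transient):
        x = 4.0 * x * (1.0 - x)
    chars = []
    for _ in range(n):
        chars.append("1" if x >= 0.5 else "0")
        x = 4.0 * x * (1.0 - x)
    return "".join(chars)

def block_entropy(symbols, n):
    """H_N from the observed frequencies of all words of length n."""
    counts = Counter(symbols[i:i + n] for i in range(len(symbols) - n + 1))
    total = sum(counts.values())
    return -sum(c / total * math.log(c / total) for c in counts.values())

symbols = logistic_symbols(200_000)
word_len = 8
h_estimate = block_entropy(symbols, word_len + 1) - block_entropy(symbols, word_len)
print(f"h estimate = {h_estimate:.4f}, ln 2 = {math.log(2):.4f}")
```

The word length must stay small compared with the logarithm of the number of data points, otherwise the word frequencies are poorly sampled and the entropy is underestimated.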

2.2.3. Algorithmic complexity 

The Shannon entropy establishes a limit on how efficiently the ensemble
of messages emitted by a source can be coded. However, we may wonder
about the compressibility properties of a single sequence with no reference
to its belonging to an ensemble. That is to say, we are looking for a
universal characterization of its compressibility or, equivalently, a
universal definition of its information content. This problem can be
addressed through the notion of algorithmic complexity, which concerns the
difficulty in reproducing a given string of symbols.

Everybody agrees that the binary digits sequence 

0111010001011001011010... (2.14) 

is, in some sense, more random than 

1010101010101010101010... (2.15) 

The notion of algorithmic complexity, independently introduced by
Kolmogorov [5], Chaitin and Solomonoff, is a way to formalize the
intuitive idea of randomness of a sequence.

Consider, for instance, a binary digit sequence (this does not constitute
a limitation) of length $N$, $Q_N = (i_1, i_2, \ldots, i_N)$, generated by a
certain computer code on a given machine $\mathcal{M}$. The algorithmic
complexity (or algorithmic information content) $K_{\mathcal{M}}(N)$ of
$Q_N$ is the bit-length of the shortest computer program able to give $Q_N$
and to stop afterward. Of course, such a length depends not only on the
sequence but also on the machine. However, Kolmogorov [5] proved the
existence of a universal computer, $\mathcal{U}$, able to perform the same
computation that a program $p$ makes on $\mathcal{M}$ with a modification of
$p$ that depends only on $\mathcal{M}$. This implies that for all finite
strings:

$$ K_{\mathcal{U}}(N) \leq K_{\mathcal{M}}(N) + C_{\mathcal{M}} , \tag{2.16} $$

where $K_{\mathcal{U}}(N)$ is the complexity with respect to the universal
computer and $C_{\mathcal{M}}$ depends only on the machine $\mathcal{M}$. We
can consider the algorithmic complexity with respect to a universal
computer, dropping the $\mathcal{M}$-dependence in the symbol for the
algorithmic complexity, $K(N)$. The reason is that we are interested in the
limit of very long sequences, $N \to \infty$, for which one defines the
algorithmic complexity per unit symbol:

$$ \mathcal{C} = \lim_{N \to \infty} \frac{K(N)}{N} , \tag{2.17} $$

that, because of (2.16), is an intrinsic quantity, i.e. independent of the
machine.

Now coming back to the $N$-sequences (2.14) and (2.15), it is obvious
that the latter can be obtained with a minimal program of length
$O(\ln N)$ and therefore, when taking the limit $N \to \infty$ in (2.17),
one obtains $\mathcal{C} = 0$. Of course $K(N)$ cannot exceed $N$, since the
sequence can always be generated by a trivial program (of bit length $N$)

$$ \text{"PRINT } i_1, i_2, \ldots, i_N \text{"} . \tag{2.18} $$

Therefore, in the case of a very irregular sequence, e.g., (2.14), one
expects $K(N) \propto N$ (i.e. $\mathcal{C} \neq 0$), and the sequence is
named complex (i.e. of non zero algorithmic complexity) or random.

Algorithmic complexity cannot be computed, and the un-computability
of $K(N)$ may be understood in terms of Gödel's incompleteness theorem
[12]. Beyond the problem of whether or not $K(N)$ is computable in a
specific case, the concept of algorithmic complexity brings an important
improvement to clarify the vague and intuitive notion of randomness.

Between the Shannon entropy, $h_{Sh}$, and the algorithmic complexity,
there exists the straightforward relationship

$$ \lim_{N \to \infty} \frac{\langle K(N) \rangle}{N} = \frac{h_{Sh}}{\ln 2} , \tag{2.19} $$

where $\langle K(N) \rangle = \sum_{C_N} P(C_N) K_{C_N}(N)$, $K_{C_N}(N)$
being the algorithmic complexity of the $N$-words, in the ensemble of
sequences, $C_N$, with a given distribution of probabilities, $P(C_N)$.
Therefore the expected complexity $\langle K(N)/N \rangle$ is asymptotically
equal to the Shannon entropy (modulo the $\ln 2$ factor). It is important to
stress again that, apart from the numerical coincidence of the values of
$\mathcal{C}$ and $h_{Sh}/\ln 2$, there is a conceptual difference between
information theory and algorithmic complexity theory. The Shannon entropy
essentially refers to the information content in a statistical sense, i.e.
it refers to an ensemble of sequences generated by a certain source. The
algorithmic complexity defines the information content of an individual
sequence [13].

The notion of algorithmic complexity can also be applied to the
trajectories of a dynamical system. This requires the introduction of finite
open coverings of the phase space, the corresponding encoding of
trajectories into symbolic sequences, and the search for the supremum of the
algorithmic complexity per symbol on varying the coverings [?]. Brudno's and
White's theorems [15, 16] state that the complexity
$\mathcal{C}(\mathbf{x})$ for a trajectory starting from the point
$\mathbf{x}$ is

$$ \mathcal{C}(\mathbf{x}) = \frac{h_{KS}}{\ln 2} , \tag{2.20} $$

for almost all $\mathbf{x}$ with respect to the natural invariant measure.
The factor $\ln 2$ stems again from the conversion between natural
logarithms and bits.

This result indicates that the KS-entropy quantifies not only the rich- 
ness of a dynamical system but also the difficulty of describing its typical 
sequences. 

2.3. Algorithmic complexity and Lyapunov Exponent 
Let us consider a 1d chaotic map

$$ x(t+1) = f(x(t)) . \tag{2.21} $$

The transmission of the sequence $\{x(t), t = 1, 2, \ldots, T\}$, accepting
only errors smaller than a tolerance $\Delta$, is carried out by using the
following strategy [?]:

1. Transmit the rule (2.21): for this task one has to use a number of bits
independent of the sequence length $T$.

2. Specify the initial condition $x(0)$ with a precision $\delta_0$ using a
finite number of bits which is independent of $T$.

3. Let the system evolve till the first time $\tau_1$ such that the distance
between two trajectories, that was initially $\delta x(0) = \delta_0$,
equals $\Delta$ and then specify again the new initial condition
$x(\tau_1)$ with precision $\delta_0$.

4. Let the system evolve and repeat the procedure (2-3), i.e. each time
the error acceptance tolerance is reached specify the initial conditions,
$x(\tau_1 + \tau_2)$, $x(\tau_1 + \tau_2 + \tau_3)$, ..., with precision
$\delta_0$. The times $\tau_1, \tau_2, \ldots$ are defined as follows:
putting $x'(\tau_1) = x(\tau_1) + \delta_0$, $\tau_2$ is given by the
minimum time such that
$|x'(\tau_1 + \tau_2) - x(\tau_1 + \tau_2)| \geq \Delta$ and so on.

Following the steps (1-4), the receiver can reconstruct, with a precision
$\Delta$, the sequence $\{x(t)\}$, by simply iterating on a computer the
evolution law (2.21) between 1 and $\tau_1 - 1$, $\tau_1$ and
$\tau_1 + \tau_2 - 1$, and so on. The amount of bits necessary to implement
the above transmission (1-4) can be easily computed. For simplicity of
notation we introduce the quantities

$$ \gamma_i = \frac{1}{\tau_i} \ln \frac{\Delta}{\delta_0} , \tag{2.22} $$

which can be regarded as a sort of effective Lyapunov exponents [20, 19].
The LE $\lambda$ can be written in terms of $\{\gamma_i\}$ as follows

$$ \lambda = \langle \gamma_i \rangle = \frac{\sum_i \tau_i \gamma_i}{\sum_i \tau_i} = \frac{1}{\overline{\tau}} \ln \frac{\Delta}{\delta_0} , \tag{2.23} $$

where

$$ \overline{\tau} = \frac{1}{N} \sum_{i=1}^{N} \tau_i $$

is the average time after which we have to transmit the new initial
condition. Note that to obtain $\lambda$ from the $\gamma_i$'s requires the
average (2.23), because the transmission time, $\tau_i$, is not constant. If
$T$ is large enough the number of transmissions, $N$, is
$T/\overline{\tau} \simeq \lambda T / \ln(\Delta/\delta_0)$. Therefore,
noting that in each transmission a reduction of the error from $\Delta$ to
$\delta_0$ requires the use of $\log_2(\Delta/\delta_0)$ bits, the total
amount of bits used in the transmission is

$$ \frac{T}{\overline{\tau}} \log_2 \frac{\Delta}{\delta_0} = \frac{\lambda}{\ln 2} T . \tag{2.24} $$

In other words the number of bits per unit time is proportional to
$\lambda$.

In more than one dimension, we have simply to replace $\lambda$ with
$h_{KS}$ in (2.24), because the above transmission procedure has to be
repeated for each of the expanding directions.
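The bookkeeping of this transmission strategy can be simulated directly. The following sketch is an added illustration under explicit assumptions: the chaotic map is taken to be the logistic map $x \to 4x(1-x)$, the precision and tolerance are set to the hypothetical values $10^{-8}$ and $10^{-3}$, and the receiver's copy is re-initialized at distance $\delta_0$ from the true state each time the tolerance is reached. Because time is discrete, the error slightly overshoots the tolerance at each crossing, so the estimate of $\lambda$ is expected to fall somewhat below $\ln 2$.

```python
import math

def logistic(x):
    return 4.0 * x * (1.0 - x)

def retransmission_times(n_events, delta0, tolerance, x0=0.3, max_wait=10_000):
    """Times tau_i between successive re-specifications of the state."""
    x = x0
    for _ in range(1_000):           # discard a transient
        x = logistic(x)
    times = []
    for _ in range(n_events):
        # receiver's copy: the true state is known only within delta0
        xr = x + delta0 if x + delta0 <= 1.0 else x - delta0
        tau = 0
        # max_wait is a safety cap in case the two copies never separate
        while abs(xr - x) < tolerance and tau < max_wait:
            x, xr = logistic(x), logistic(xr)
            tau += 1
        times.append(tau)
    return times

delta0, tolerance = 1e-8, 1e-3
times = retransmission_times(2_000, delta0, tolerance)
tau_mean = sum(times) / len(times)

# lambda from the mean retransmission time, and bits sent per unit time
lam_est = math.log(tolerance / delta0) / tau_mean
bits_per_step = math.log2(tolerance / delta0) / tau_mean
print(f"mean tau = {tau_mean:.2f}")
print(f"lambda estimate = {lam_est:.3f} (ln 2 = {math.log(2):.3f})")
print(f"bits per unit time = {bits_per_step:.3f}")
```

The number of bits per unit time obtained this way equals the estimated exponent divided by $\ln 2$, independently of the particular values chosen for the precision and the tolerance.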



3. Limitation of the Lyapunov exponent and Kolmogorov-Sinai entropy

Lyapunov exponents and KS-entropy are properly defined only in specific
asymptotic limits: very long times and arbitrary accuracy. However, the
predictability problem in realistic situations entails considering finite
time intervals and limited accuracy. The first obvious way of quantifying
the predictability of a physical system is in terms of the predictability
time $T_p$, i.e. the time interval on which one can typically forecast the
system. A simple argument suggests

$$ T_p \sim \frac{1}{\lambda} \ln\left( \frac{\Delta}{\delta_0} \right) . \tag{3.1} $$

However, the above relation is too naive to be of practical relevance in any
realistic system. Indeed, it does not take into account some basic features



polonica printed on February 4, 2008 






of dynamical systems. The Lyapunov exponent is a global quantity, because
it measures the average rate of divergence of nearby trajectories. In
general there exist finite-time fluctuations and their probability
distribution function (pdf) is important for the characterization of
predictability. The generalized Lyapunov exponents have been introduced with
the purpose of taking into account such fluctuations [20, 21]. Moreover, the
Lyapunov exponent is defined for the linearized dynamics, i.e., by computing
the rate of separation of two infinitesimally close trajectories. On the
other hand, in measuring the predictability time (3.1) one is interested in
a finite tolerance $\Delta$, because the initial error $\delta_0$ is finite.
A recent generalization of the Lyapunov exponent to finite size errors
extends the study of the perturbation growth to the nonlinear regime, i.e.
both $\delta_0$ and $\Delta$ are not infinitesimal [?].

3.1. Growth of non infinitesimal perturbations 

We now discuss an example where the Lyapunov exponent is of little
relevance for characterizing the predictability. This problem can be
illustrated by considering the following coupled map model:

$$ \begin{cases} \mathbf{x}(t+1) = \mathbf{R}\,\mathbf{x}(t) + \epsilon\,\mathbf{h}(y(t)) \\ y(t+1) = G(y(t)) , \end{cases} \tag{3.2} $$

where $\mathbf{x} \in \mathbb{R}^2$, $y \in \mathbb{R}^1$, $\mathbf{R}$ is a
rotation matrix of arbitrary angle $\theta$, $\mathbf{h}$ is a vector
function and $G$ is a chaotic map. For simplicity we consider a linear
coupling $\mathbf{h}(y) = (y, y)$ and the logistic map $G(y) = 4y(1-y)$.

For $\epsilon = 0$ we have two independent systems: a regular and a chaotic
one. Thus the Lyapunov exponent of the $\mathbf{x}$ subsystem is
$\lambda_x(\epsilon = 0) = 0$, i.e., it is completely predictable. On the
contrary, the $y$ subsystem is chaotic with
$\lambda_y = \lambda_1 = \ln 2$. The switching on of a small coupling
($\epsilon > 0$) yields a single three-dimensional chaotic system with a
positive global Lyapunov exponent

$$ \lambda = \lambda_y + O(\epsilon) . \tag{3.3} $$

A direct application of (3.1) would give

$$ T_p^{(x)} \sim T_p \sim \frac{1}{\lambda_y} , \tag{3.4} $$

but this result is clearly unacceptable: the predictability time for
$\mathbf{x}$ seems to be independent of the value of the coupling
$\epsilon$. This is not an artifact of the chosen example; indeed, the same
argument applies to many physical situations [22]. A well known example is
the gravitational three body problem, with one body (asteroid) much smaller
than the other two (planets).
Fig. 1. Growth of error $|\delta\mathbf{x}(t)|$ for the coupled map (3.2).
The rotation angle is $\theta = 0.82099$, the coupling strength
$\epsilon = 10^{-5}$ and the initial error only on the $y$ variable is
$\delta y = \delta_0 = 10^{-10}$. Dashed line
$|\delta\mathbf{x}(t)| \sim e^{\lambda_1 t}$ where $\lambda_1 = \ln 2$, solid
line $|\delta\mathbf{x}(t)| \sim t^{1/2}$.

When the gravitational feedback of the asteroid on the two planets is
neglected (restricted problem), one has a chaotic asteroid in the regular
field of the planets. As soon as the feedback is taken into account (i.e.
$\epsilon > 0$ in the example) one has a non-separable three body system
with a positive LE. Of course, intuition correctly suggests that, in the
limit of small asteroid mass ($\epsilon \to 0$), a forecast of the planet
motion should be possible even for very long times. The apparent paradox
arises from the misuse of formula (3.1), strictly valid for tangent vectors,
in the case of non infinitesimal regimes. As soon as the errors become
large, the full nonlinear evolution of the three body system has to be taken
into account. This situation is clearly illustrated by the model (3.2) in
Figure 1. The evolution of $\delta\mathbf{x}$ is given by

$$ \delta\mathbf{x}(t+1) = \mathbf{R}\,\delta\mathbf{x}(t) + \epsilon\,\delta\mathbf{h}(y) , \tag{3.5} $$

where, with our choice, $\delta\mathbf{h} = (\delta y, \delta y)$. At the
beginning, both $|\delta\mathbf{x}|$ and $\delta y$ grow exponentially.
However, the available phase space for $y$ is finite and the uncertainty
reaches the saturation value $\delta y \sim O(1)$ in a time
$t \sim 1/\lambda_1$. At larger times the two realizations of the $y$
variable are completely uncorrelated and their difference $\delta y$ in
(3.5) acts as a noisy term. As a consequence, the growth of the uncertainty
on $\mathbf{x}$ becomes diffusive with a diffusion coefficient proportional
to $\epsilon^2$:

$$ |\delta\mathbf{x}(t)| \sim \epsilon\, t^{1/2} , \tag{3.6} $$

so that:

$$ T_p^{(x)} \sim \epsilon^{-2} . \tag{3.7} $$

This example shows that, even in simple systems, the Lyapunov expo- 
nent can be of little relevance for the characterization of the predictability. 
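The two regimes of error growth in the coupled map can be reproduced with a few lines of code. The sketch below is an added illustration: it iterates a reference and a perturbed copy of the model with a rotation in the plane driven by the logistic map, with the perturbation placed on the $y$ variable only. The parameter values are assumptions chosen for this example (they follow the values quoted in the caption of Fig. 1 as reconstructed here), and the initial point is arbitrary.

```python
import math

def coupled_map_error(n_steps, eps=1e-5, theta=0.82099, delta0=1e-10, y0=0.3):
    """Return |delta x(t)| for t = 0 .. n_steps-1, with initial error on y only."""
    c, s = math.cos(theta), math.sin(theta)
    x1, x2, y = 0.1, 0.2, y0          # reference trajectory
    u1, u2, v = x1, x2, y0 + delta0   # perturbed trajectory
    errors = []
    for _ in range(n_steps):
        errors.append(math.hypot(u1 - x1, u2 - x2))
        # x(t+1) = R x(t) + eps * (y, y),  y(t+1) = 4 y (1 - y)
        x1, x2, y = (c * x1 - s * x2 + eps * y,
                     s * x1 + c * x2 + eps * y,
                     4.0 * y * (1.0 - y))
        u1, u2, v = (c * u1 - s * u2 + eps * v,
                     s * u1 + c * u2 + eps * v,
                     4.0 * v * (1.0 - v))
    return errors

errors = coupled_map_error(10_000)
late_error = max(errors[-1_000:])
for t in (10, 20, 30, 100, 1_000, 9_999):
    print(f"t={t:5d}  |dx| = {errors[t]:.3e}")
print(f"largest |dx| over the last 1000 steps: {late_error:.3e}")
```

After a short exponential transient the error on $\mathbf{x}$ keeps growing only slowly: after $10^4$ iterations it is still far below order one, whereas a naive extrapolation of exponential growth at rate $\ln 2$ would have saturated after a few tens of iterations.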

In more complex systems, in which different scales are present, one is
typically interested in forecasting the large scale motion, while the LE is
related to the small scale dynamics. A familiar example is weather
forecasting: although the LE of the atmosphere is indeed rather large, due
to the small scale convective motion, large-scale weather predictions are
possible for about 10 days [23, 24]. It is thus natural to seek a
generalization of the LE to finite perturbations from which one can obtain a
more realistic estimate of the predictability time. It is worth underlining
the important fact that finite errors are not confined to the tangent space
but are governed by the complete nonlinear dynamics. In this sense the
extension of the LE to finite errors will give more information on the
system.

Aiming to generalize the LE to non infinitesimal perturbations let us
now define the Finite Size Lyapunov Exponent (FSLE) [25]. Consider a
reference trajectory $\mathbf{x}(t)$ and a perturbed trajectory
$\mathbf{x}'(t)$, such that $|\mathbf{x}'(0) - \mathbf{x}(0)| = \delta$.
One integrates the two trajectories and computes the time
$\tau_1(\delta, r)$ necessary for the separation
$|\mathbf{x}'(t) - \mathbf{x}(t)|$ to grow from $\delta$ to $r\delta$. At
time $t = \tau_1(\delta, r)$ the distance between the trajectories is
rescaled to $\delta$ and the procedure is repeated in order to compute
$\tau_2(\delta, r), \tau_3(\delta, r), \ldots$.

The threshold ratio $r$ must be $r > 1$, but not too large in order to avoid
contributions from different scales in $\tau(\delta, r)$. A typical choice
is $r = 2$ (for which $\tau(\delta, r)$ is properly a "doubling" time) or
$r = \sqrt{2}$. In the same spirit of the discussion leading to Eqs. (2.22)
and (2.23), we may introduce an effective finite size growth rate:

$$ \gamma_i(\delta, r) = \frac{1}{\tau_i(\delta, r)} \ln r . \tag{3.8} $$

After having performed $\mathcal{N}$ error-doubling experiments, we can
define the FSLE as

$$ \lambda(\delta) = \langle \gamma(\delta, r) \rangle_t = \left\langle \frac{1}{\tau(\delta, r)} \right\rangle_t \ln r = \frac{1}{\langle \tau(\delta, r) \rangle_e} \ln r , \tag{3.9} $$

where $\langle \tau(\delta, r) \rangle_e$ is

$$ \langle \tau(\delta, r) \rangle_e = \frac{1}{\mathcal{N}} \sum_{n=1}^{\mathcal{N}} \tau_n(\delta, r) , \tag{3.10} $$

see [25] for details. In the infinitesimal limit, the FSLE reduces to the
standard Lyapunov exponent

$$ \lim_{\delta \to 0} \lambda(\delta) = \lambda_1 . \tag{3.11} $$
Fig. 2. $\lambda(\delta)$ as a function of $\delta$ for the coupled map
(3.2) with $\epsilon = 10^{-5}$. The perturbation has been initialized as in
Fig. 1. For $\delta \to 0$, $\lambda(\delta) \simeq \lambda_1$ (horizontal
line). The dashed line shows the behavior
$\lambda(\delta) \sim \delta^{-2}$.

In practice this limit means that $\lambda(\delta)$ displays a constant
plateau at $\lambda_1$ for sufficiently small $\delta$ (Fig. 2). For finite
values of $\delta$ the behavior of $\lambda(\delta)$ depends on the details
of the nonlinear dynamics. For example, in the model (3.2) the diffusive
behavior (3.6), by simple dimensional arguments, corresponds to
$\lambda(\delta) \sim \delta^{-2}$. Since the FSLE measures the rate of
divergence of trajectories at finite errors, one might wonder whether it is
just another way to look at the average response
$\langle \ln(|\mathbf{x}'(t) - \mathbf{x}(t)|) \rangle$ as a function of
time. The answer is negative, because taking the average at fixed time is
not the same as computing the average doubling time at fixed scale, as in
(3.9). This is particularly clear in the case of strongly intermittent
systems, in which $|\delta\mathbf{x}(t)|$ can be very different in each
realization. In the presence of intermittency, averaging over different
realizations at fixed times can produce a spurious regime due to the
superposition of exponential and diffusive contributions by different
samples at the same time. The FSLE method can be easily applied to data
analysis [26]. For other approaches addressing the problem of
non-infinitesimal perturbations see [28, 27].
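The error-doubling procedure can be sketched for a simple one-dimensional map. The example below is an added illustration with its own assumptions: the system is the logistic map $x \to 4x(1-x)$, and, because time is discrete and the error generally overshoots the threshold $r\delta$, the growth actually realized at the crossing, $\ln(|\delta x(\tau)|/\delta)$, is used in place of $\ln r$. This adjustment for maps is an assumption of the sketch, not something stated in the text above.

```python
import math

def logistic(x):
    return 4.0 * x * (1.0 - x)

def fsle(delta, r=2.0, n_experiments=5_000, x0=0.3, max_steps=10_000):
    """Finite size growth rate at scale delta from repeated threshold crossings."""
    x = x0
    for _ in range(1_000):            # discard a transient
        x = logistic(x)
    total_time = 0
    total_log_growth = 0.0
    for _ in range(n_experiments):
        # restart the perturbed copy at distance delta from the reference
        xp = x + delta if x + delta <= 1.0 else x - delta
        d, tau = delta, 0
        while d < r * delta and tau < max_steps:
            x, xp = logistic(x), logistic(xp)
            d = abs(xp - x)
            tau += 1
        if d >= r * delta:            # skip experiments that never crossed
            total_time += tau
            total_log_growth += math.log(d / delta)
    return total_log_growth / total_time

lam_small = fsle(1e-6)
print(f"lambda(1e-6) = {lam_small:.3f} (ln 2 = {math.log(2):.3f})")
for delta in (1e-4, 1e-2, 1e-1):
    print(f"lambda({delta:g}) = {fsle(delta):.3f}")
```

At small scales the estimate is expected to sit on the plateau given by the Lyapunov exponent of the map; at scales comparable with the size of the attractor the growth rate is limited by saturation of the error.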

3.2. The $\epsilon$-entropy

For most systems, the computation of the Kolmogorov-Sinai entropy (2.12)
is practically impossible, because it involves the limit of arbitrarily fine
resolution and infinite times. However, in the same spirit as the FSLE,
by relaxing the requirement of arbitrary accuracy, one can introduce the
$\epsilon$-entropy, which measures the amount of information needed to
reproduce a trajectory with finite accuracy $\epsilon$ in phase-space.
Roughly speaking the $\epsilon$-entropy can be considered the counterpart,
in information theory, of the FSLE. Such a quantity was originally
introduced by Shannon and by Kolmogorov [29]. Recently Gaspard and Wang [30]
made use of this concept to characterize a large variety of processes.

We start with a continuous-time variable $\mathbf{x}(t) \in \mathbb{R}^d$,
which represents the state of a $d$-dimensional system; we discretize the
time by introducing an interval $\tau$ and we consider the new variable

$$ \mathbf{X}^{(m)}(t) = (\mathbf{x}(t), \mathbf{x}(t+\tau), \ldots, \mathbf{x}(t+(m-1)\tau)) . \tag{3.12} $$

Of course $\mathbf{X}^{(m)}(t) \in \mathbb{R}^{md}$ and it corresponds to
the trajectory which lasts for a time $T = m\tau$.

In data analysis, the space where the state of the system lives is
unknown and usually only a scalar variable $u(t)$ can be measured. Then, one
considers vectors $(u(t), u(t+\tau), \ldots, u(t+m\tau-\tau))$, that live in
$\mathbb{R}^m$ and allow a reconstruction of the original phase space, known
as delay embedding in the literature [?]; it is a special case of (3.12).
Introduce now a partition of the phase space $\mathbb{R}^d$, using cells of
edge $\epsilon$ in each of the $d$ directions. Since the region where a
bounded motion evolves contains a finite number of cells, each
$\mathbf{X}^{(m)}(t)$ can be coded into a word of length $m$, out of a
finite alphabet:

$$ \mathbf{X}^{(m)}(t) \longrightarrow W^m(\epsilon, t) = (i(\epsilon, t), i(\epsilon, t+\tau), \ldots, i(\epsilon, t+m\tau-\tau)) , \tag{3.13} $$

where $i(\epsilon, t+j\tau)$ labels the cell in $\mathbb{R}^d$ containing
$\mathbf{x}(t+j\tau)$. From the time evolution one obtains, under the
hypothesis of ergodicity, the probabilities $P(W^m(\epsilon))$ of the
admissible words $\{W^m(\epsilon)\}$. We can now introduce the
$(\epsilon, \tau)$-entropy per unit time, $h(\epsilon, \tau)$ [30]:

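$$h(\epsilon,\tau) = \frac{1}{\tau} \lim_{m\to\infty} \frac{1}{m} H_m(\epsilon,\tau), \qquad (3.14)$$

where $H_m$ is the block entropy of blocks (words) with length $m$:

$$H_m(\epsilon,\tau) = -\sum_{\{W^m(\epsilon)\}} P(W^m(\epsilon)) \ln P(W^m(\epsilon)). \qquad (3.15)$$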
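As a rough illustration of Eqs. (3.13)-(3.15), the following Python sketch
codes a scalar time series into words and estimates the block entropy and a
finite-$m$ value of $h(\epsilon,\tau)$. The function names, the `stride`
argument (the delay $\tau$ measured in sampling steps), the use of
$H_m/(m\tau)$ at fixed $m$ in place of the limit, and the logistic map as a
test signal are all assumptions made here for illustration; they are not
taken from the original analysis.

```python
import math
from collections import Counter

def block_entropy(series, eps, m, stride=1):
    """Estimate H_m(eps, tau) of Eq. (3.15) from a scalar series.

    `stride` is tau in units of the sampling step (an assumption of this sketch).
    """
    # Code each sample by the index of the cell of edge eps that contains it.
    symbols = [math.floor(u / eps) for u in series]
    span = (m - 1) * stride
    # Slide along the series and count the words W^m(eps, t) of Eq. (3.13).
    words = Counter(
        tuple(symbols[t + j * stride] for j in range(m))
        for t in range(len(symbols) - span)
    )
    total = sum(words.values())
    return -sum((c / total) * math.log(c / total) for c in words.values())

def eps_tau_entropy(series, eps, m, stride=1, dt=1.0):
    """Finite-m estimate of h(eps, tau) in Eq. (3.14): H_m / (m * tau)."""
    tau = stride * dt
    return block_entropy(series, eps, m, stride) / (m * tau)

# Hypothetical test signal: the logistic map x -> 4x(1-x), whose KS entropy
# is ln 2; the two cells of edge 0.5 form a generating partition for it.
x, series = 0.3, []
for _ in range(20000):
    x = 4.0 * x * (1.0 - x)
    series.append(x)

h_est = eps_tau_entropy(series, eps=0.5, m=8)
print(h_est, math.log(2))
```

With a finite record the number of possible words grows quickly with $m$ and
with $1/\epsilon$, so the estimate is only meaningful while each admissible
word is visited many times.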

For the sake of simplicity, we ignored the dependence on details of the
partition. To make $h(\epsilon,\tau)$ partition-independent one has to consider
a generic partition $\mathcal{A}$ of the phase space and to evaluate the
Shannon entropy on this partition, $h_{Sh}(\mathcal{A},\tau)$. The
$\epsilon$-entropy is thus defined as the infimum over all partitions for which
the diameter of each cell is less than $\epsilon$ [30]:

$$h(\epsilon,\tau) = \inf_{\mathcal{A}:\,\mathrm{diam}(\mathcal{A}) < \epsilon} h_{Sh}(\mathcal{A},\tau). \qquad (3.16)$$

Note that the time dependence in (3.16) is trivial for deterministic systems,
and that in the limit $\epsilon \to 0$ one recovers the Kolmogorov-Sinai entropy

$$h_{KS} = \lim_{\epsilon \to 0} h(\epsilon,\tau).$$

4. Characterization of Complexity and system modeling 

In the previous Sections, we discussed the characterization of dynamical
behaviors when the evolution laws are known either exactly or with some
degree of uncertainty. In experimental investigations, however, only time
records of some observable are available, while the equations of motion for
the observable are generally unknown. The predictability problem in this
latter case, at least from a conceptual point of view, can be treated as if the
evolution laws were known. Indeed, in principle, the embedding technique
allows for a reconstruction of the phase space […]. Nevertheless, there are
rather severe limitations for high-dimensional systems [34], and even in
low-dimensional ones non-trivial features appear in the presence of noise [32].
In this Section we show that an entropic analysis at different resolution
scales provides a pragmatic classification of a signal and gives suggestions
for modeling of systems. In particular we illustrate, using some examples, how
quantities such as the $\epsilon$-entropy or the FSLE can display a subtle
transition from the large to the small scales. A negative consequence of this
is the difficulty in distinguishing, from data analysis alone, a genuine
deterministic chaotic system from one with intrinsic randomness […]. On the
other hand, the way the $\epsilon$-entropy or the FSLE depends on the
(resolution) scale allows for a classification of the stochastic or chaotic
character of a signal, and this gives some freedom in modeling the system.

4.1. How random is a random number generator?

The "true character" of the number sequence $(x_1, x_2, \ldots)$ obtained by a
(pseudo) random number generator (PRNG) on a computer is an issue of
paramount importance in computer simulations and modeling. One would like to
have a sequence that is as random as possible, but is forced to use
deterministic algorithms to generate $(x_1, x_2, \ldots)$. This subsection is
mainly based on the paper […]. A simple and popular PRNG is the multiplicative
congruential one:

$$z_{n+1} = N_1 z_n \bmod N_2, \qquad (4.1)$$

with an integer multiplier $N_1$ and modulus $N_2$. The $\{z_n\}$ are integer
numbers from which one hopes to generate a sequence of random variables
$x_n = z_n/N_2$ which are uncorrelated and uniformly distributed in the unit
interval. A first problem arises from the periodic nature of the rule (4.1),
as a consequence of its discrete nature. Note that the rule (4.1) can also be
interpreted as a deterministic dynamical system, i.e.

$$x_{n+1} = N_1 x_n \bmod 1, \qquad (4.2)$$

which has a uniform invariant measure and a KS entropy
$h_{KS} = \lambda = \ln N_1$. When imposing the integer arithmetic of Eq. (4.1)
onto this system, we are, in the language of dynamical systems, considering an
unstable periodic orbit of Eq. (4.2), with the particular constraint that, to
achieve the period $N_2 - 1$ (i.e. all integers $< N_2$ should belong to the
orbit of Eq. (4.1)), it has to contain all values $k/N_2$, with
$k = 1, 2, \ldots, N_2 - 1$. Since the natural invariant measure of Eq. (4.2)
is uniform, such an orbit represents the measure of a chaotic solution in an
optimal way. Every sequence of a PRNG is characterized by two quantities: its
period $T$ and its positive Lyapunov exponent $\lambda$, which is identical to
the entropy of a chaotic orbit of the equivalent dynamical system. Of course a
good random number generator must have a very large period and an entropy as
large as possible.

It is natural to ask how this apparent randomness can be reconciled with the
facts that (a) the PRNG is a deterministic dynamical system and (b) it is a
discrete-state system. If the period is long enough, on shorter times only
point (a) matters, and it can be discussed in terms of the behavior of the
$\epsilon$-entropy, $h(\epsilon)$. At high resolutions ($\epsilon < 1/N_1$), it
seems rather reasonable to think that the true deterministic chaotic nature of
the congruential rule shows up and, therefore,
$h(\epsilon) \simeq h_{KS} = \ln N_1$. On the other hand, for
$\epsilon > 1/N_1$, one expects to observe the "apparent random" behavior of
the system, i.e. $h(\epsilon) \sim \ln(1/\epsilon)$; see Fig. 3.
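The two regimes can be probed numerically with a short Python sketch of the
rule (4.1). The multipliers and moduli below ($N_1 = 3$, $N_2 = 65537$ and
$N_1 = 16807$, $N_2 = 2^{31}-1$), the cell sizes, and the use of the difference
$H_m - H_{m-1}$ at $m = 2$ as a finite-$m$ estimate of $h(\epsilon)$ are
illustrative assumptions of this sketch and need not coincide with the choices
behind Fig. 3.

```python
import math
from collections import Counter

def congruential(n1, n2, count, z0=1):
    """Rule (4.1), z -> N1 z mod N2, returned as x_n = z_n / N2."""
    z, xs = z0, []
    for _ in range(count):
        z = (n1 * z) % n2
        xs.append(z / n2)
    return xs

def period(n1, n2, z0=1):
    """Steps needed for the integer orbit to return to z0 (assumes gcd(N1, N2) = 1)."""
    z, steps = (n1 * z0) % n2, 1
    while z != z0:
        z = (n1 * z) % n2
        steps += 1
    return steps

def block_entropy(xs, eps, m):
    """Shannon entropy of the words of length m built on cells of size eps."""
    symbols = [math.floor(x / eps) for x in xs]
    words = Counter(tuple(symbols[t:t + m]) for t in range(len(symbols) - m + 1))
    total = sum(words.values())
    return -sum(c / total * math.log(c / total) for c in words.values())

def h_m(xs, eps, m):
    """Finite-m estimate H_m - H_{m-1} of the eps-entropy (tau = 1)."""
    return block_entropy(xs, eps, m) - block_entropy(xs, eps, m - 1)

# Small multiplier, fine cells (eps < 1/N1): the deterministic rule is resolved.
xs_small = congruential(3, 65537, 65536)
h_fine = h_m(xs_small, 1.0 / 27.0, 2)      # to be compared with ln N1 = ln 3

# Large multiplier, coarse cells (eps > 1/N1): the output looks random.
xs_large = congruential(16807, 2**31 - 1, 200000)
h_coarse = h_m(xs_large, 0.25, 2)          # to be compared with ln(1/eps) = ln 4

print(period(3, 65537), h_fine, math.log(3), h_coarse, math.log(4))
```

Resolving the regime $\epsilon < 1/N_1$ for a large multiplier would require
a number of samples far beyond what this sketch uses, which is why a small
$N_1$ is chosen for the fine-resolution case.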



4.2. High dimensional systems 

We discuss an example of a high-dimensional system with non-trivial behavior
as the resolution scale is varied, namely the emergence of non-trivial
collective behavior.

Let us consider a globally coupled map (GCM) defined as follows:

$$x_n(t+1) = (1-\epsilon) f_a(x_n(t)) + \frac{\epsilon}{N} \sum_{i=1}^{N} f_a(x_i(t)), \qquad (4.3)$$

where $N$ is the total number of elements, $\epsilon$ here denotes the coupling
strength, and $f_a(u)$ is a chaotic map on the interval $[0, 1]$, depending on
the control parameter $a$.

Fig. 3. The $\epsilon$-entropies, $h_m(\epsilon)$, at varying the embedding
dimension $m$ for the multiplicative congruential random number generator
Eq. (4.1), for different choices of $N_1$ and $N_2$.

The evolution of a macroscopic variable, e.g., the center of mass

$$m(t) = \frac{1}{N} \sum_{i=1}^{N} x_i(t), \qquad (4.4)$$

upon varying $\epsilon$ and $a$ in Eq. (4.3), displays different behaviors […]:

(a) Standard Chaos: $m(t)$ obeys Gaussian statistics with a standard
deviation $\sigma_N = \sqrt{\langle m(t)^2 \rangle - \langle m(t) \rangle^2} \sim N^{-1/2}$;

(b) Macroscopic Periodicity: $m(t)$ is a superposition of a periodic function
and small fluctuations $O(N^{-1/2})$;

(c) Macroscopic Chaos: $m(t)$ exhibits an irregular motion, as seen by
plotting $m(t)$ vs. $m(t-1)$. The plot sketches a structured function (with
thickness $\sim N^{-1/2}$), and suggests a chaotic motion for $m(t)$.
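A minimal Python sketch of the globally coupled map and of its center of mass
may help to fix ideas. The specific tent map $f_a(u) = a\,(1/2 - |u - 1/2|)$,
the parameter values, the transient length and the seed are assumptions of
this sketch; which of the regimes (a)-(c) is observed depends on the coupling
and on $a$, and the sketch makes no claim about that.

```python
import math
import random

def tent(u, a):
    """Tent map on [0, 1] (an assumed form of the local map f_a)."""
    return a * (0.5 - abs(u - 0.5))

def gcm_center_of_mass(n_elements, a, coupling, steps, transient=200, seed=0):
    """Iterate the GCM and return the time series of the center of mass m(t)."""
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n_elements)]
    history = []
    for t in range(transient + steps):
        fx = [tent(u, a) for u in x]
        mean_field = sum(fx) / n_elements
        # Each element feels its own map plus the global mean field.
        x = [(1.0 - coupling) * f + coupling * mean_field for f in fx]
        if t >= transient:
            history.append(sum(x) / n_elements)
    return history

def std(values):
    """Standard deviation of a list of numbers."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

# Fluctuations of m(t) for two system sizes (illustrative parameters).
for n in (100, 400):
    m_series = gcm_center_of_mass(n, a=1.7, coupling=0.1, steps=500)
    print(n, std(m_series))
```

Comparing `std(m_series)` for several values of `n` is one way to check
whether the fluctuations follow the $N^{-1/2}$ scaling of case (a) for a given
parameter choice.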

In the case of macroscopic chaos, the center of mass is expected to evolve
with typical times longer than the characteristic time $1/\lambda_1$ of the
full dynamics (microscopic dynamics), $\lambda_1$ being the Lyapunov exponent
of the GCM. Indeed, conceptually, macroscopic chaos for GCM can be thought of
as the analogue of hydrodynamical chaos for molecular motion. In spite of a
huge microscopic Lyapunov exponent
($\lambda_1 \sim 1/\tau_c \sim 10^{11}\,\mathrm{s}^{-1}$, where $\tau_c$ is the
collision time), one can have rather different behaviors at a hydrodynamical
(coarse-grained) level: regular motion ($\lambda_{hydro} \le 0$) or chaotic
motion ($0 < \lambda_{hydro} \ll \lambda_1$). In principle, if the
hydrodynamic equations were known, a characterization of the macroscopic
behavior would be possible by means of standard dynamical system techniques.
However, in generic CML there are no general systematic methods to build up
the macroscopic equations, apart from particular cases […]. We recall that for
chaotic systems, in the limit of infinitesimal perturbations $\delta \to 0$,
one has $\lambda(\delta) \to \lambda_1$, i.e. $\lambda(\delta)$ displays a
plateau at the value $\lambda_1$ for sufficiently small $\delta$. However, for
non-infinitesimal $\delta$, one can expect that the $\delta$-dependence of
$\lambda(\delta)$ may give information on the characteristic time scales
governing the system and, hence, it could be able to characterize the
macroscopic motion. In particular, at large scales
($\delta \gg 1/\sqrt{N}$), the fast microscopic components saturate and
$\lambda(\delta) \approx \lambda_M$, where $\lambda_M$ can be fairly called the
"macroscopic" Lyapunov exponent.

The FSLE has been determined by looking at the evolution of $|\delta m(t)|$,
which has been initialized at the value $\delta m(t) = \delta_{min}$ by
shifting all the elements of the unperturbed system by the quantity
$\delta_{min}$ (i.e. $x'_i(0) = x_i(0) + \delta_{min}$), for each realization.
The computation has been performed by choosing the tent map as local map, but
similar results can be obtained for other maps […].

The main result can be summarized as follows:

• at small $\delta$ ($\ll 1/\sqrt{N}$), where $N$ is the number of elements,
the "microscopic" Lyapunov exponent is recovered, i.e.
$\lambda(\delta) \approx \lambda_{micro}$;

• at large $\delta$ ($\gg 1/\sqrt{N}$), another plateau
$\lambda(\delta) \approx \lambda_{macro}$ appears, which can be much smaller
than the microscopic one.

The emerging scenario is that, at a coarse-grained level, i.e.
$\delta \gg 1/\sqrt{N}$, the system can be described by an "effective"
hydrodynamical equation (which in some cases can be low-dimensional), while
the "true" high-dimensional character appears only at very high resolution,
i.e.

$$\delta \le \delta_c = O\!\left(\frac{1}{\sqrt{N}}\right).$$



4.3. Diffusion in deterministic systems and Brownian motion

Consider the following map, which generates a diffusive behavior on the
large scales [40]:

$$x_{t+1} = [x_t] + F(x_t - [x_t]), \qquad (4.5)$$

where $[x_t]$ indicates the integer part of $x_t$ and $F(y)$ is given by:

$$F(y) = \begin{cases} (2+a)\,y & \text{if } y \in [0, 1/2) \\ (2+a)\,y - (1+a) & \text{if } y \in [1/2, 1]. \end{cases} \qquad (4.6)$$

Fig. 4. The map $F(x)$ (4.6) for $a = 0.4$ is shown with superimposed the
approximating (regular) map $G(x)$ (4.9) obtained by using 40 intervals of
slope 0.

The largest Lyapunov exponent $\lambda$ can be obtained immediately:
$\lambda = \ln |F'|$, with $F' = dF/dy = 2 + a$. One expects the following
scenario for $h(\epsilon)$:

$$h(\epsilon) \approx \lambda \quad \text{for } \epsilon < 1, \qquad (4.7)$$

$$h(\epsilon) \propto \frac{D}{\epsilon^2} \quad \text{for } \epsilon > 1, \qquad (4.8)$$

where $D$ is the diffusion coefficient, i.e.
$\langle (x_t - x_0)^2 \rangle \approx 2 D t$ for large $t$.
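The two quantities entering this scenario, $\lambda$ and $D$, can be estimated
with a few lines of Python. This is only a sketch: it assumes that $F$ has
slope $2 + a$ on both halves of the unit cell, with the first branch passing
through the origin, that the integer part $[x]$ is implemented by `math.floor`
(so that the map is translation invariant also for negative $x$), and it uses
an arbitrary ensemble size, number of steps and seed.

```python
import math
import random

def F(y, a):
    """Local map on the unit cell: slope 2 + a on both halves (assumed form)."""
    if y < 0.5:
        return (2.0 + a) * y
    return (2.0 + a) * y - (1.0 + a)

def step(x, a):
    """One iteration of Eq. (4.5); math.floor plays the role of [x]."""
    cell = math.floor(x)
    return cell + F(x - cell, a)

def diffusion_coefficient(a, particles=1000, steps=500, seed=1):
    """Estimate D from <(x_t - x_0)^2> ~ 2 D t over an ensemble of orbits."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(particles):
        x0 = rng.random()          # start inside the cell [0, 1)
        x = x0
        for _ in range(steps):
            x = step(x, a)
        total += (x - x0) ** 2
    return total / particles / (2.0 * steps)

a = 0.4
lyapunov = math.log(2.0 + a)       # lambda = ln|F'| for constant slope
D = diffusion_coefficient(a)
print(lyapunov, D)
```

Because the slope is constant, the Lyapunov exponent needs no averaging,
whereas the estimate of $D$ carries a statistical error that shrinks with the
number of particles.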
Consider now a stochastic system, namely a noisy map

$$x_{t+1} = [x_t] + G(x_t - [x_t]) + \sigma \eta_t, \qquad (4.9)$$

where $G(y)$, as shown in Fig. 4, is a piecewise linear map which approximates
the map $F(y)$, and $\eta_t$ is a stochastic process uniformly distributed in
the interval $[-1, 1]$, with no correlation in time. When $|dG/dy| < 1$, as is
the case we consider, the map (4.9), in the absence of noise, gives a
non-chaotic time evolution.
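One possible way to build such a map $G$ is sketched below in Python. The
construction rule (on each of an even number of equal intervals $G$ has a
fixed slope smaller than one and coincides with $F$ at the interval midpoint),
the default values of the number of intervals, slope and noise amplitude, and
all function names are assumptions of this sketch; the text only states that
$G$ is piecewise linear, approximates $F$ and has $|dG/dy| < 1$.

```python
import math
import random

def F(y, a=0.4):
    """Chaotic local map with slope 2 + a on both halves (assumed form)."""
    return (2.0 + a) * y if y < 0.5 else (2.0 + a) * y - (1.0 + a)

def G(y, a=0.4, intervals=10_000, slope=0.9):
    """Piecewise linear stand-in for F with |dG/dy| = slope < 1.

    On each interval G equals F at the midpoint (hypothetical matching rule).
    `intervals` is assumed even, so that no interval straddles y = 1/2.
    """
    k = min(int(y * intervals), intervals - 1)   # index of the interval holding y
    center = (k + 0.5) / intervals
    return F(center, a) + slope * (y - center)

def noisy_step(x, sigma, rng, a=0.4, intervals=10_000, slope=0.9):
    """One iteration of Eq. (4.9) with eta_t uniform in [-1, 1]."""
    cell = math.floor(x)
    eta = rng.uniform(-1.0, 1.0)
    return cell + G(x - cell, a, intervals, slope) + sigma * eta

rng = random.Random(0)
x, sigma = 0.3, 1e-4
for _ in range(1000):
    x = noisy_step(x, sigma, rng)
print(x)
```

With this rule the distance between $G$ and $F$ is at most
$(2 + a - \text{slope})/(2\,\text{intervals})$, so that $G$ follows $F$
closely while remaining non-expanding inside each interval.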

Now we compare the finite size Lyapunov exponent for the chaotic map (4.5)
and for the noisy one (4.9). In the latter the FSLE has been computed using
two different realizations of the noise. In Fig. 5 we show $\lambda(\epsilon)$
versus $\epsilon$ for the two cases. The two curves are practically
indistinguishable in the region $\epsilon > \sigma$. The differences appear
only at very small scales $\epsilon < \sigma$, where $\lambda(\epsilon)$ grows
as $\epsilon$ decreases for the noisy case, while remaining at the same value
for the chaotic deterministic case.

Both the FSLE and the $(\epsilon,\tau)$-entropy analysis show that we can
distinguish three different regimes observing the dynamics of (4.9) on
different length scales. On the large length scales $\epsilon > 1$ we observe
diffusive behavior in both models. On length scales $\sigma < \epsilon < 1$
both models show chaotic deterministic behavior, because the entropy and the
FSLE are independent of $\epsilon$ and larger than zero. Finally, on the
smallest length scales $\epsilon < \sigma$ we see stochastic behavior for the
system (4.9), i.e. $h(\epsilon) \sim -\ln(\epsilon)$, while the system (4.5)
still shows chaotic behavior.

Fig. 5. Lyapunov exponent $\lambda(\epsilon)$ versus $\epsilon$ obtained for
the map $F(y)$ (4.6) with $a = 0.4$ (○) and for the noisy (regular) map (4.9)
(□) with $10^4$ intervals of slope 0.9 and $\sigma = 10^{-4}$. Straight lines
indicate the Lyapunov exponent $\lambda = \ln 2.4$ and the diffusive behavior
$\lambda(\epsilon) \sim \epsilon^{-2}$.

4.4. On the distinction between chaos and noise

The above examples show that the distinction between chaos and noise can be
a highly non-trivial task, which makes sense only in very peculiar cases,
e.g., very low-dimensional systems. Nevertheless, even in this case, the
entropic analysis can be unable to recognize the "true" character of the
system due to the lack of resolution. Again, the comparison between the
diffusive map (4.5) and the noisy map (4.9) is an example of these
difficulties. For $\sigma < \epsilon < 1$ both the systems (4.5) and (4.9), in
spite of their "true" character, will be classified as chaotic, while for
$\epsilon > 1$ both can be considered as stochastic.

In high-dimensional chaotic systems, with $N$ degrees of freedom, one has
typically $h(\epsilon) = h_{KS} \sim O(N)$ for $\epsilon < \epsilon_c$ (where
$\epsilon_c \to 0$ as $N \to \infty$), while for $\epsilon > \epsilon_c$,
$h(\epsilon)$ decreases, often with a power law [30]. Since also in some
stochastic processes the $\epsilon$-entropy obeys a power law, this can be a
source of confusion.

These kinds of problems are not abstract ones, as a recent debate on
"microscopic chaos" demonstrates [41, 42, 43]. The detection of microscopic
chaos by data analysis has been recently addressed in a work of Gaspard et
al. [41]. These authors, from an entropic analysis of an ingenious experiment
on the position of a Brownian particle in a liquid, claim to give empirical
evidence for microscopic chaos. In other words, they state that the diffusive
behavior observed for a Brownian particle is the consequence of chaos at a
molecular level. Their work can be briefly summarized as follows: from a long
($\approx 1.5 \times 10^5$ data) record of the position of a Brownian particle
they compute the $\epsilon$-entropy with the Cohen-Procaccia method [44], from
which they obtain:

$$h(\epsilon) \sim \frac{D}{\epsilon^2}, \qquad (4.10)$$

where $D$ is the diffusion coefficient. Then, assuming that the system is
deterministic, and making use of the inequality
$h(\epsilon > 0) \le h_{KS}$, they conclude that the system is chaotic.
However, their result does not give direct evidence that the system is
deterministic and chaotic. Indeed, the power law (4.10) can be produced with
different mechanisms:

1. a genuine chaotic system with diffusive behavior, as the map (4.6);

2. a non-chaotic system with some noise, as the map (4.9), or a genuine
Brownian system;

3. a deterministic linear non-chaotic system with many degrees of freedom
(see for instance [45]);

4. a "complicated" non-chaotic system such as the Ehrenfest wind-tree model,
where a particle diffuses in a plane due to collisions with randomly
placed, fixed oriented square scatterers, as discussed by Cohen et al.
[42] in their comment to Ref. [41].

It seems to us that the weak points of the analysis in Ref. [41] are:

a) the explicit assumption that the system is deterministic;

b) the limited number of data points and therefore limitations in both
resolution and block length.

Point (a) is crucial: without this assumption (even with an enormous data
set) it is not possible to distinguish between 1) and 2). In cases 3) and 4)
it is possible, at least in principle, to understand that the systems are
"trivial" (i.e. not chaotic), but this requires a huge number of data. For
example, Cohen et al. [42] estimated that, in order to distinguish between 1)
and 4) using realistic parameters of a typical liquid, the number of data
points required has to be at least $\sim 10^{34}$.




Concluding, we have the apparently paradoxical result that "complexity" helps
in the construction of models. Basically, when one has a variety of behaviors
at varying the resolution scale, there is a certain freedom in the choice of
the model to adopt. For some systems the behavior at large scales can be
realized both with chaotic deterministic models and with suitable stochastic
processes. From a pragmatic point of view, the fact that in certain stochastic
processes $h(\epsilon) \sim \epsilon^{-\alpha}$ can be indeed extremely useful
for modeling such high-dimensional systems. Perhaps the most relevant case in
which one can use this freedom in modeling is fully developed turbulence,
whose non-infinitesimal (the so-called inertial range) properties can be
successfully mimicked in terms of multi-affine stochastic processes (see
Ref. [46]).

5. Concluding Remarks

The guideline of this paper has been the interpretation of different aspects
of the predictability of a system as a way to characterize its complexity.

We have discussed the relation between chaoticity, the Kolmogorov-Sinai
entropy and algorithmic complexity. As clearly shown in the seminal works of
Alekseev and Yakobson [1] and Ford [2], the time sequences generated by a
system with sensitive dependence on initial conditions have non-zero
algorithmic complexity. A relation exists between the maximal compression of
a sequence and its KS entropy. Therefore, one can give a definition of
complexity, without referring to a specific description, as an intrinsic
property of the system.

The study of these different aspects of predictability constitutes a useful
method for a quantitative characterization of "complexity", suggesting the
following equivalences:

Complex = Uncompressible = Unpredictable

The above point of view, based on dynamical systems and information theory,
quantifies the complexity of a sequence considering each symbol relevant, but
it does not capture the structural level. Let us clarify this point with the
following example. A binary sequence obtained by coin tossing is, from the
point of view adopted in this review, complex since it cannot be compressed
(i.e. it is unpredictable). On the other hand, such a sequence is somehow
trivial, i.e. with low "organizational" complexity. It would be important to
introduce a quantitative measure of this intuitive idea. Progress in research
on this intriguing and difficult issue is still rather slow. We just mention
some of the most promising proposals, such as the logical depth and the
sophistication [47].